HUMAN MutY

BACKGROUND OF THE INVENTION 

The GO system includes 7,8-dihydro-8-oxoguanine, the structure of the
predominant tautomeric form of the GO lesion. Oxidative damage can lead to GO
lesions in DNA. MutY removes the misincorporated adenine from the A/GO mispairs
that result from error-prone replication past the GO lesion. Repair polymerases are
much less error-prone during translesion synthesis and can lead to a C/GO pair.
Oxidative damage can also lead to 8-oxo-dGTP. Inaccurate replication could result in
the misincorporation of 8-oxo-dGTP opposite template A residues, leading to A/GO
mispairs. MutY could be involved in the mutation process because it is active on the
A/GO substrate and would remove the template A, leading to the AT->CG
transversions that are characteristic of a MutT strain. The 8-oxo-dGTP could also be
incorporated opposite template cytosines, resulting in a damaged C/GO pair that could
be corrected by MutM.

This invention relates to newly identified polynucleotides, polypeptides 
encoded by such polynucleotides, the use of such polynucleotides and polypeptides, as 
well as the production of such polynucleotides and polypeptides. More particularly, 
the polypeptide of the present invention has been putatively identified as a human
homologue of the E. coli MutY gene, sometimes hereinafter referred to as "hMYH".

Mismatches arise in DNA through DNA replication errors, through DNA
recombination, and following exposure of DNA to deaminating or oxidizing
environments. Cells have a host of strategies that counter the threat to their genetic
integrity from mismatched and chemically damaged base pairs (Friedberg, E.C., DNA
Repair, W.H. Freeman, New York (1985)). With regard specifically to mismatch repair
of replication errors, Escherichia coli and Salmonella typhimurium direct the repair to
the unmethylated newly synthesized DNA strand by dam methylation at d(GATC)
sequences, using the MutHLS systems (Clavery, J.P. and Lacks, S.A., Microbiol. Rev.
50:133-165 (1986); Modrich, P., Annu. Rev. Genet. 25:229-253 (1991); Radman, M.
and R. Wagner, Annu. Rev. Genet. 20:523-528 (1986)). The very short patch pathway
of E. coli is specific for the correction of T/G mismatches (a mismatch is indicated by a
slash) and is responsible for the correction of deaminated 5-methylcytosine (Jones, M.,
et al., Genetics 115:605-610 (1987); Lieb, M., Mol. Gen. Genet. 181:118-125 (1983);
Lieb, M., and D. Read, Genetics 114:1041-1060 (1986); Raposa, S. and N.S. Fox,
Genetics 117:381-390 (1987)).

The E. coli MutY pathway corrects A/G and A/C mismatches, as well as
adenines paired with 7,8-dihydro-8-oxo-deoxyguanine (8-oxoG or GO) (Au, K.G., et
al., Proc. Natl. Acad. Sci. USA 85:9163-9166 (1988); Lu, A.L. and D.Y. Chang,
Genetics 118:593-600 (1988); Michaels, M.L., et al., Proc. Natl. Acad. Sci. USA
89:7022-7025 (1992); Michaels, M.L., et al., Biochemistry 31:10964-10968 (1992);
Radicella, J.P., et al., Proc. Natl. Acad. Sci. USA 85:9674-9678 (1988); Su, S.-S., et
al., J. Biol. Chem. 263:6829-6835 (1988)). The 39-kDa MutY protein shares some
homology with E. coli endonuclease III and contains a [4Fe-4S]2+ cluster (Lu, A.-L., et
al., 1994, unpublished data; Michaels, M.L., et al., Nucleic Acids Res. 18:3841-3845
(1990); Tsai-Wu, J.-J., et al., Proc. Natl. Acad. Sci. USA 89:8779-8783 (1992);
Tsai-Wu, J.-J., et al., J. Bacteriol. 173:1902-1910 (1991)). The MutY preparation of
Tsai-Wu et al. (Tsai-Wu, J.-J., et al., Proc. Natl. Acad. Sci. USA 89:8779-8783 (1992)) has
both DNA N-glycosylase and apurinic or apyrimidinic (AP) endonuclease activities,
whereas those purified by Au et al. (Au, K.G., et al., Proc. Natl. Acad. Sci. USA
86:8877-8881 (1989)) and Michaels et al. (Michaels, M.L., et al., Proc. Natl. Acad.
Sci. USA 89:7022-7025 (1992); Michaels, M.L., et al., Biochemistry 31:10964-10968
(1992)) possess only the glycosylase activity. The DNA glycosylase specifically
excises the mispaired adenine from the mismatch and the AP endonuclease cleaves the
first phosphodiester bond 3' to the resultant AP site (Au, K.G., et al., Proc. Natl. Acad.
Sci. USA 86:8877-8881 (1989); Tsai-Wu, J.-J., et al., Proc. Natl. Acad. Sci. USA
89:8779-8783 (1992)).
Repair by the MutY pathway involves a short repair tract and DNA polymerase
I (Radicella, J.P., et al., J. Bacteriol. 175:7732-7736 (1993); Tsai-Wu, J.-J., and A.-L.
Lu, Mol. Gen. Genet. 244:444-450 (1994)).

The mismatch repair strategy detailed above has been evolutionarily
conserved. Genetic analysis suggests that Saccharomyces cerevisiae has a repair
system analogous to the bacterial dam methylation-dependent pathway (Bishop, D.K.,
et al., Nature (London) 243:362-364 (1987); Reenan, R.A. and R.D. Kolodner,
Genetics 132:963-973 (1992); Reenan, R.A. and R.D. Kolodner, Genetics 132:975-985
(1992); Williamson, M., et al., Genetics 110:609-646 (1985)). This pathway is
functionally homologous to the E. coli very short patch pathway for the correction of
deaminated 5-methylcytosine.

Two mutator genes in E. coli, the mutY and the mutM genes (Cabrera et al., J.
Bacteriol., 170:5405-5407 (1988); and Nghiem, Y., et al., PNAS, USA, 85:9163-9166
(1988)), have been described, which work together to prevent mutations from certain
types of oxidative damage, dealing in particular with the oxidized guanine lesion,
8-oxodGuanine (Michaels et al., PNAS, USA, 89:7022-7025 (1992)). Michaels, M.L.,
and Miller, J.H., J. Bacteriol., 174:6321-6325 (1992), summarize the concerted
action of these two enzymes, both of which are glycosylases. The MutM protein
removes 8-oxodG from the DNA, and the resulting AP site is repaired to restore the
G:C base pair. Some lesions are not repaired before replication, which results in 50%
insertion of an A across from the 8-oxodG, which can lead to a G:C to T:A
transversion at the next round of replication. However, the MutY protein removes the
A across from 8-oxodG and repair synthesis restores a C most of the time, allowing the
MutM protein another opportunity to repair the lesion. In accordance with this,
mutators lacking either the MutM or MutY protein have an increase specifically in the
G:C to T:A transversion (Cabrera et al., Id. (1988); and Nghiem, Y., et al., Id. (1988)),
and cells lacking both enzymes have an enormous increase in this base substitution
(Michaels et al., Id. (1992)). A third protein, the product of the mutT gene, prevents the
incorporation of 8-oxodGTP by hydrolyzing the oxidized triphosphate back to the
monophosphate (Maki, H., and Sekiguchi, M., Nature, 355:273-275 (1992)),
preventing A:T to C:G transversions.
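
The division of labor among MutM, MutY and MutT described above can be summarized as a small lookup. The following Python sketch is illustrative only: the substrate labels, the dictionary name GO_SYSTEM and the functions go_repair_step and trace are hypothetical names chosen for this sketch, and the model ignores the cases in which repair synthesis fails to restore a C.

```python
# Illustrative summary of the GO system as described in the text above.
# All names here are hypothetical labels chosen for this sketch.
GO_SYSTEM = {
    "C/GO": ("MutM", "removes 8-oxodG; repair of the AP site restores G:C"),
    "A/GO": ("MutY", "removes the A; repair synthesis usually restores C/GO"),
    "8-oxo-dGTP": ("MutT", "hydrolyzes the triphosphate to the monophosphate"),
}


def go_repair_step(substrate: str) -> str:
    """Return the enzyme that the text assigns to a given substrate."""
    enzyme, _action = GO_SYSTEM[substrate]
    return enzyme


def trace(substrate: str) -> list[str]:
    """Follow a substrate through successive enzymes.

    Simplifying assumption: MutY always hands an A/GO mispair back
    to MutM as a C/GO pair ("most of the time" in the text).
    """
    next_substrate = {"A/GO": "C/GO"}
    path = []
    while substrate in GO_SYSTEM:
        path.append(f"{substrate} -> {go_repair_step(substrate)}")
        substrate = next_substrate.get(substrate, "")
    return path
```

Calling trace("A/GO") walks the mispair first to MutY and then to MutM, which mirrors the "another opportunity to repair the lesion" described in the text.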

Accordingly, there exists a need in the art for identification and
characterization of genes and proteins which modulate the human cellular mutation
rate, for use as, among other things, markers in cancer and diseases associated with
DNA repair. In particular, there is a need for isolating and characterizing human
mismatch repair genes and proteins, which are essential to proper development and
health of tissues and organs, such as the colon, and which can, among other things,
play a role in preventing, ameliorating, diagnosing or correcting dysfunctions or
disease, particularly cancer, and most particularly colon cancer, such as, for example,
HNPCC (hereditary non-polyposis colon cancer).

In accordance with one aspect of the present invention, there is provided a 
novel mature polypeptide, as well as biologically active and diagnostically or 
therapeutically useful fragments, analogs and derivatives thereof. The polypeptide of
the present invention is of human origin.

In accordance with another aspect of the present invention, there are provided 
isolated nucleic acid molecules encoding a polypeptide of the present invention 
including mRNAs, cDNAs, genomic DNAs as well as analogs and biologically active 
and diagnostically or therapeutically useful fragments thereof. 

In accordance with yet a further aspect of the present invention, there is
provided a process for producing such polypeptide by recombinant techniques
comprising culturing recombinant prokaryotic and/or eukaryotic host cells, containing
a nucleic acid sequence encoding a polypeptide of the present invention, under
conditions promoting expression of said protein and subsequent recovery of said
protein.

In accordance with yet a further aspect of the present invention, there is
provided a process for utilizing such polypeptide, or polynucleotide encoding such
polypeptide, for therapeutic purposes, for example, to repair oxidative damage to DNA
and prevent mutations from oxidative lesions, treat genetic diseases related to a
mutated hMYH gene, for example, xeroderma pigmentosum and neoplasia, and to
diagnose an abnormal transformation of cells, particularly cancer, and most
particularly colon cancer, such as for example HNPCC, and/or to diagnose a
susceptibility to abnormal transformation of cells, particularly cancer, and most
particularly colon cancer, such as for example HNPCC.

In accordance with yet a further aspect of the present invention, there are 
provided antibodies against such polypeptides. 

In accordance with yet a further aspect of the present invention, there are also
provided nucleic acid probes comprising nucleic acid molecules of sufficient length to
specifically hybridize to a nucleic acid sequence of the present invention.

In accordance with still another aspect of the present invention, there are
provided diagnostic assays for detecting diseases or susceptibility to diseases related to
mutations in the nucleic acid sequences encoding a polypeptide of the present
invention. In accordance with a further aspect of the invention, there is provided a
process for diagnosing a cancer comprising determining from a sample derived from a
patient a decreased level of activity of a polypeptide having the sequence of SEQ ID NO:2.

In accordance with a further aspect of the invention, there is provided a process for
diagnosing a cancer comprising determining from a sample derived from a patient a
decreased level of expression of a gene encoding a polypeptide having the sequence of
SEQ ID NO:2.

In accordance with a further aspect of the invention, there is provided a process for
diagnosing a cancer comprising determining from a sample derived from a patient a
decreased level of activity of a polypeptide having the sequence of SEQ ID NO:9.

In accordance with a further aspect of the invention, there is provided a process for
diagnosing a cancer comprising determining from a sample derived from a patient a
decreased level of expression of a gene encoding a polynucleotide having the sequence
of SEQ ID NO:9.

In accordance with yet a further aspect of the present invention, there is
provided a process for utilizing such polypeptides, or polynucleotides encoding such
polypeptides, for in vitro purposes related to scientific research, for example, synthesis
of DNA and manufacture of DNA vectors.

These and other aspects of the present invention should be apparent to those
skilled in the art from the teachings herein.

The following drawings are illustrative of embodiments of the invention and
are not meant to limit the scope of the invention as encompassed by the claims.

Figure 1 is an illustration of the cDNA and corresponding deduced amino acid
sequence of the polypeptide of the present invention. The nucleotide sequence of
hMYH is shown with the numbering relative to the A of the ATG translation start site
(+1). The amino acid sequence is shown below in single letter code and is also
numbered in the margin.
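
The numbering convention of Figure 1 can be illustrated with a short Python sketch. It rests on two assumptions that the text does not state: the first ATG in the sequence is taken as the translation start, and upstream bases are numbered -1, -2, and so on with no position 0. The function name and the toy sequence are hypothetical and are not taken from SEQ ID NO:1.

```python
def position_relative_to_atg(cdna: str, index: int) -> int:
    """Convert a 0-based index into `cdna` to start-site numbering.

    The A of the first ATG is +1. Bases upstream of it are numbered
    -1, -2, ... (assumed convention: there is no position 0).
    """
    start = cdna.upper().find("ATG")
    if start < 0:
        raise ValueError("no ATG found in sequence")
    offset = index - start
    return offset + 1 if offset >= 0 else offset


# Toy sequence for illustration only: three upstream bases, then ATG GCC.
toy = "GGCATGGCC"
print(position_relative_to_atg(toy, 3))  # the A of ATG
print(position_relative_to_atg(toy, 2))  # the base just upstream
```

Under these assumptions a variant described as "position 366" would sit 365 bases downstream of the A of the start codon.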

Figure 2 is an amino acid sequence comparison between the polypeptide of the
present invention (top line) and E. coli MutY protein (bottom line).

In accordance with an aspect of the present invention, there is provided an
isolated nucleic acid (polynucleotide) which encodes for the mature polypeptide
having the deduced amino acid sequence of Figure 1 (SEQ ID NO:2).

The polynucleotide of this invention may be obtained from numerous tissues of
the human body, including brain and testes. The polynucleotide of this invention was
discovered in a cDNA library derived from a human cerebellum. The hMYH gene
contains 15 introns, and is 7.1 kb long. The 16 exons encode a nuclear protein of 535
amino acids that displays 41% identity to the E. coli MutY protein, which provides an
important function in the repair of oxidatively damaged DNA, and helps to prevent
mutations from oxidative lesions. The hMYH gene maps on the short arm of
chromosome 1, between p32.1 and p34.3. The hMYH protein and the E. coli MutY
protein are extensively homologous, particularly near the beginning of the E. coli
protein, where a string of 14 amino acids is identical.

The polynucleotide of the present invention may be in the form of RNA or in
the form of DNA, which DNA includes cDNA, genomic DNA, and synthetic DNA.
The DNA may be double-stranded or single-stranded, and if single stranded may be
the coding strand or non-coding (anti-sense) strand. The coding sequence which
encodes the mature polypeptide may be identical to the coding sequence shown in
Figure 1 (SEQ ID NO:1) or may be a different coding sequence which coding
sequence, as a result of the redundancy or degeneracy of the genetic code, encodes the
same mature polypeptide as the DNA of Figure 1 (SEQ ID NO:1).
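
The effect of degeneracy can be shown with a minimal Python sketch in which two different coding sequences translate to the same peptide. The codon table below is only a small excerpt of the standard genetic code, and the two sequences are toy examples invented for illustration; neither is taken from SEQ ID NO:1.

```python
# Excerpt of the standard genetic code: only the codons used below.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "CAA": "Q", "CAG": "Q",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Two toy coding sequences that differ at three nucleotide positions.
seq_a = "ATGGCTCAATAA"
seq_b = "ATGGCGCAGTGA"
print(translate(seq_a), translate(seq_b))
```

Both toy sequences yield the peptide MAQ although their nucleotide sequences differ, which is the situation the paragraph above describes for alternative coding sequences of the same mature polypeptide.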

The polynucleotide which encodes for the mature polypeptide of Figure 1
(SEQ ID NO:2) may include, but is not limited to: only the coding sequence for the
mature polypeptide; the coding sequence for the mature polypeptide and additional
coding sequence such as a leader or secretory sequence or a proprotein sequence; the
coding sequence for the mature polypeptide (and optionally additional coding
sequence) and non-coding sequence, such as introns or non-coding sequence 5' and/or
3' of the coding sequence for the mature polypeptide.

Thus, the term "polynucleotide encoding a polypeptide" encompasses a
polynucleotide which includes only coding sequence for the polypeptide as well as a
polynucleotide which includes additional coding and/or non-coding sequence.

The present invention further relates to variants of the hereinabove described
polynucleotides which encode for fragments, analogs and derivatives of the
polypeptide having the deduced amino acid sequence of Figure 1 (SEQ ID NO:2).
The variant of the polynucleotide may be a naturally occurring allelic variant of the
polynucleotide or a non-naturally occurring variant of the polynucleotide.

Thus, the present invention includes polynucleotides encoding the same mature
polypeptide as shown in Figure 1 (SEQ ID NO:2) as well as variants of such
polynucleotides which variants encode for a fragment, derivative or analog of the
polypeptide of Figure 1 (SEQ ID NO:2). Such nucleotide variants include deletion
variants, substitution variants and addition or insertion variants. Certain specific
variants, among others, are provided by the present invention, such as an isolated
nucleic acid having a cytosine (C) at position 366 and/or position 729 of the nucleotide
sequence of Figure 1 (SEQ ID NO:1). Certain other specific variants, among others, are
provided by the present invention, such as an isolated nucleic acid having a cytosine
(C) at position 1095 of the nucleotide sequence of Figure 1 (SEQ ID NO:1). Further
specific variants include, but are not limited to, an isolated polypeptide sequence
having a glutamine (Q) at position 365 of the amino acid sequence in Figure 1 (SEQ
ID NO:2).

As hereinabove indicated, the polynucleotide may have a coding sequence
which is a naturally occurring allelic variant of the coding sequence shown in Figure 1
(SEQ ID NO:1). As known in the art, an allelic variant is an alternate form of a
polynucleotide sequence which may have a substitution, deletion or addition of one or
more nucleotides, which does not substantially alter the function of the encoded
polypeptide.

The present invention also includes polynucleotides, wherein the coding
sequence for the mature polypeptide may be fused in the same reading frame to a
polynucleotide sequence which aids in expression and secretion of a polypeptide from
a host cell, for example, a leader sequence which functions as a secretory sequence for
controlling transport of a polypeptide from the cell. Thus, for example, the
polynucleotide of the present invention may encode for a mature protein, or for a
protein having a prosequence or for a protein having both a prosequence and a
presequence (leader sequence).

The polynucleotides of the present invention may also have the coding
sequence fused in frame to a marker sequence which allows for purification of the
polypeptide of the present invention. The marker sequence may be a hexa-histidine
tag supplied by a pQE vector to provide for purification of the mature polypeptide
fused to the marker in the case of a bacterial host, or, for example, the marker sequence
may be a hemagglutinin (HA) tag when a mammalian host, e.g. COS-7 cells, is used.
The HA tag corresponds to an epitope derived from the influenza hemagglutinin protein
(Wilson, I., et al., Cell, 37:767 (1984)).

The term "gene" means the segment of DNA involved in producing a
polypeptide chain; it includes regions preceding and following the coding region
(leader and trailer) as well as intervening sequences (introns) between individual
coding segments (exons).

Fragments of the full length gene of the present invention may be used as a 
hybridization probe for a cDNA library to isolate the full length cDNA and to isolate 
other cDNAs which have a high sequence similarity to the gene or similar biological 
activity. Probes of this type have at least 15 bases, preferably 30 bases and most 
preferably 50 or more bases. The probe may also be used to identify a cDNA clone 
corresponding to a full length transcript and a genomic clone or clones that contain the 
complete gene including regulatory and promoter regions, exons, and introns. An
example of a screen comprises isolating the coding region of the gene by using the
known DNA sequence to synthesize an oligonucleotide probe. Labeled
oligonucleotides having a sequence complementary to that of the gene of the present
invention are used to screen a library of human cDNA, genomic DNA or mRNA to
determine which members of the library the probe hybridizes to.

The present invention further relates to polynucleotides which hybridize to the
hereinabove-described sequences if there is at least 70%, preferably at least 90%, and
more preferably at least 95% identity between the sequences. The present invention
particularly relates to polynucleotides which hybridize under stringent conditions to
the hereinabove-described polynucleotides. As herein used, the term "stringent
conditions" means hybridization will occur only if there is at least 95% and preferably
at least 97% identity between the sequences. The polynucleotides which hybridize to
the hereinabove described polynucleotides in a preferred embodiment encode
polypeptides which retain substantially the same biological function or activity
as the mature polypeptide encoded by the cDNAs of Figure 1 (SEQ ID NO:1).

Alternatively, the polynucleotide may have at least 15 bases, preferably at least
30 bases, and more preferably at least 50 bases which hybridize to a polynucleotide of
the present invention and which has an identity thereto, as hereinabove described, and
which may or may not retain activity. For example, such polynucleotides may be
employed as probes for the polynucleotide of SEQ ID NO:1, for example, for recovery
of the polynucleotide or as a diagnostic probe or as a PCR primer.
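
As a rough illustration of probe design, the Python sketch below derives the complementary (antisense) oligonucleotide for a target stretch and sorts a probe by the length tiers stated above (at least 15, 30 or 50 bases). The function names, tier labels and the 15-base target are hypothetical and chosen for this example; the target is not a fragment of SEQ ID NO:1, and the sketch does not model hybridization conditions.

```python
# Watson-Crick complement table for DNA, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the strand that would base-pair with `seq`, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_tier(probe: str) -> str:
    """Classify a probe by the length thresholds given in the text."""
    n = len(probe)
    if n >= 50:
        return "most preferred"
    if n >= 30:
        return "preferred"
    if n >= 15:
        return "minimum"
    return "too short"


# Hypothetical 15-base target used only for illustration.
target = "ATGGCCTTAGCAGGT"
probe = reverse_complement(target)
print(probe, probe_tier(probe))
```

Taking the reverse complement twice returns the original sequence, which is a convenient sanity check when preparing probe or primer sequences by hand.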

Thus, the present invention is directed to polynucleotides having at least a 70%
identity, preferably at least 90% and more preferably at least a 95% identity to a
polynucleotide which encodes the polypeptide of SEQ ID NO:2 and polynucleotides
complementary thereto as well as portions thereof, which portions have at least 15
consecutive bases, preferably 30 consecutive bases and most preferably at least 50
consecutive bases and to polypeptides encoded by such polynucleotides.

The present invention further relates to a polypeptide which has the deduced
amino acid sequence of Figure 1 (SEQ ID NO:2), as well as fragments, analogs and
derivatives of such polypeptide.

The terms "fragment," "derivative" and "analog" when referring to the
polypeptide of Figure 1 (SEQ ID NO:2) mean a polypeptide which retains essentially
the same biological function or activity as such polypeptide. Thus, an analog includes
a proprotein which can be activated by cleavage of the proprotein portion to produce
an active mature polypeptide.

The polypeptide of the present invention may be a recombinant polypeptide, a
natural polypeptide or a synthetic polypeptide, preferably a recombinant polypeptide.

The fragment, derivative or analog of the polypeptide of Figure 1 (SEQ ID
NO:2) may be (i) one in which one or more of the amino acid residues are substituted
with a conserved or non-conserved amino acid residue (preferably a conserved amino
acid residue) and such substituted amino acid residue may or may not be one encoded
by the genetic code, or (ii) one in which one or more of the amino acid residues
includes a substituent group, or (iii) one in which the mature polypeptide is fused with
another compound, such as a compound to increase the half-life of the polypeptide (for
example, polyethylene glycol), or (iv) one in which additional amino acids are
fused to the mature polypeptide, such as a leader or secretory sequence or a sequence
which is employed for purification of the mature polypeptide or a proprotein sequence.
Such fragments, derivatives and analogs are deemed to be within the scope of those
skilled in the art from the teachings herein.

The polypeptides and polynucleotides of the present invention are preferably 
provided in an isolated form, and preferably are purified to homogeneity. 

The term "isolated" means that the material is removed from its original 
environment (e.g., the natural environment if it is naturally occurring). For example, a 
naturally-occurring polynucleotide or polypeptide present in a living animal is not 
isolated, but the same polynucleotide or polypeptide, separated from some or all of the 
coexisting materials in the natural system, is isolated. Such polynucleotides could be 
part of a vector and/or such polynucleotides or polypeptides could be part of a 
composition, and still be isolated in that such vector or composition is not part of its 
natural environment. 

The polypeptides of the present invention include the polypeptide of SEQ ID
NO:2 (in particular the mature polypeptide) as well as polypeptides which have at least
70% similarity (preferably at least 70% identity) to the polypeptide of SEQ ID NO:2
and more preferably at least 90% similarity (more preferably at least 90% identity) to
the polypeptide of SEQ ID NO:2 and still more preferably at least 95% similarity (still
more preferably at least 95% identity) to the polypeptide of SEQ ID NO:2 and also
include portions of such polypeptides with such portion of the polypeptide generally
containing at least 30 amino acids and more preferably at least 50 amino acids.

As known in the art, "similarity" between two polypeptides is determined by
comparing the amino acid sequence and its conserved amino acid substitutes of one
polypeptide to the sequence of a second polypeptide. Moreover, also known in the art
is "identity," which means the degree of sequence relatedness between two polypeptide
or two polynucleotide sequences as determined by the identity of the match
between two strings of such sequences. Both identity and similarity can be readily
calculated. While there exist a number of methods to measure identity and similarity
between two polynucleotide or polypeptide sequences, the terms "identity" and
"similarity" are well known to skilled artisans (Carillo, H., and Lipman, D., SIAM
J. Applied Math., 48:1073 (1988)). Methods commonly employed to determine
identity or similarity between two sequences include, but are not limited to, those
disclosed in Guide to Huge Computers, Martin J. Bishop, ed., Academic Press, San Diego,
1994, and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48:1073 (1988).
Preferred methods to determine identity are designed to give the largest match
between the two sequences tested. Methods to determine identity and similarity are
codified in computer programs. Preferred computer program methods to determine
identity and similarity between two sequences include, but are not limited to,
BLASTP, BLASTN, and FASTA.
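
The distinction between identity and similarity can be made concrete with a minimal Python sketch. It assumes the two sequences are already aligned to equal length (with "-" marking gaps) and uses the alignment length as the denominator. The conservative-substitution groups are a simplified, hypothetical grouping chosen for illustration. Programs such as BLASTP and FASTA compute their own alignments and use substitution matrices, so their reported values may differ.

```python
# Hypothetical conservative-substitution groups, for illustration only.
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ"),
]


def _check(a: str, b: str) -> None:
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, equal-length and non-empty")


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions holding the same residue (gaps never match)."""
    _check(a, b)
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def percent_similarity(a: str, b: str) -> float:
    """Percentage of positions that are identical or a conservative substitution."""
    _check(a, b)

    def similar(x: str, y: str) -> bool:
        if "-" in (x, y):
            return False
        return x == y or any(x in g and y in g for g in CONSERVATIVE_GROUPS)

    hits = sum(1 for x, y in zip(a, b) if similar(x, y))
    return 100.0 * hits / len(a)


# Toy aligned peptides, not taken from SEQ ID NO:2.
pep_a, pep_b = "MKVLDE", "MRVIDQ"
print(percent_identity(pep_a, pep_b), round(percent_similarity(pep_a, pep_b), 1))
```

For the toy peptides, three of six positions are identical and two more (K/R and L/I) fall within a conservative group, so similarity exceeds identity, as it generally does under this kind of definition.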

Fragments or portions of the polypeptides of the present invention may be
employed for producing the corresponding full-length polypeptide by peptide
synthesis; therefore, the fragments may be employed as intermediates for producing
the full-length polypeptides. Fragments or portions of the polynucleotides of the
present invention may be used to synthesize full-length polynucleotides of the present
invention.

The present invention also relates to vectors which include polynucleotides of 
the present invention, host cells which are genetically engineered with vectors of the 
invention and the production of polypeptides of the invention by recombinant 
techniques. 

Host cells are genetically engineered (transduced or transformed or transfected)
with the vectors of this invention which may be, for example, a cloning vector or an
expression vector. The vector may be, for example, in the form of a plasmid, a viral
particle, a phage, etc. The engineered host cells can be cultured in conventional
nutrient media modified as appropriate for activating promoters, selecting
transformants or amplifying the genes of the present invention. The culture conditions,
such as temperature, pH and the like, are those previously used with the host cell
selected for expression, and will be apparent to the ordinarily skilled artisan.

The polynucleotides of the present invention may be employed for producing
polypeptides by recombinant techniques. Thus, for example, the polynucleotide may
be included in any one of a variety of expression vectors for expressing a polypeptide.
Such vectors include chromosomal, nonchromosomal and synthetic DNA sequences,
e.g., derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids;
vectors derived from combinations of plasmids and phage DNA, viral DNA such as
vaccinia, adenovirus, fowl pox virus, and pseudorabies. However, any other vector
may be used as long as it is replicable and viable in the host.

The appropriate DNA sequence may be inserted into the vector by a variety of
procedures. In general, the DNA sequence is inserted into an appropriate restriction
endonuclease site(s) by procedures known in the art. Such procedures and others are
deemed to be within the scope of those skilled in the art.

The DNA sequence in the expression vector is operatively linked to an
appropriate expression control sequence(s) (promoter) to direct mRNA synthesis. As
representative examples of such promoters, there may be mentioned: LTR or SV40
promoter, the E. coli lac or trp, the phage lambda PL promoter and other promoters
known to control expression of genes in prokaryotic or eukaryotic cells or their
viruses. The expression vector also contains a ribosome binding site for translation
initiation and a transcription terminator. The vector may also include appropriate
sequences for amplifying expression.

In addition, the expression vectors preferably contain one or more selectable
marker genes to provide a phenotypic trait for selection of transformed host cells such
as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such
as tetracycline or ampicillin resistance in E. coli.

The vector containing the appropriate DNA sequence as hereinabove
described, as well as an appropriate promoter or control sequence, may be employed to
transform an appropriate host to permit the host to express the protein.

As representative examples of appropriate hosts, there may be mentioned:
bacterial cells, such as E. coli, Streptomyces, Salmonella typhimurium; fungal cells,
such as yeast; insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such
as CHO, COS or Bowes melanoma; adenoviruses; plant cells, etc. The selection of an
appropriate host is deemed to be within the scope of those skilled in the art from the
teachings herein.

More particularly, the present invention also includes recombinant constructs
comprising one or more of the sequences as broadly described above. The constructs
comprise a vector, such as a plasmid or viral vector, into which a sequence of the
invention has been inserted, in a forward or reverse orientation. In a preferred aspect of
this embodiment, the construct further comprises regulatory sequences, including, for
example, a promoter, operably linked to the sequence. Large numbers of suitable
vectors and promoters are known to those of skill in the art, and are commercially
available. The following vectors are provided by way of example. Bacterial: pQE70,
pQE60, pQE-9 (Qiagen), pBS, pD10, phagescript, psiX174, pBluescript SK, pBSKS,
pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); pTRC99a, pKK223-3, pKK233-3,
pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLNEO, pSV2CAT, pOG44, pXT1, pSG
(Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia). However, any other plasmid
or vector may be used as long as it is replicable and viable in the host.

Promoter regions can be selected from any desired gene using CAT
(chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two
appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters
include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include
CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from
retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and
promoter is well within the level of ordinary skill in the art.

In a further embodiment, the present invention relates to host cells containing
the above-described constructs. The host cell can be a higher eukaryotic cell, such as a
mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be
a prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host
cell can be effected by calcium phosphate transfection, DEAE-Dextran mediated
transfection, or electroporation (Davis, L., Dibner, M., Battey, J., Basic Methods in
Molecular Biology, (1986)).

The constructs in host cells can be used in a conventional manner to produce
the gene product encoded by the recombinant sequence. Alternatively, the
polypeptides of the invention can be synthetically produced by conventional peptide
synthesizers.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other
cells under the control of appropriate promoters. Cell-free translation systems can also
be employed to produce such proteins using RNAs derived from the DNA constructs
of the present invention. Appropriate cloning and expression vectors for use with
prokaryotic and eukaryotic hosts are described by Sambrook, et al., Molecular
Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y., (1989),
the disclosure of which is hereby incorporated by reference.

Transcription of the DNA encoding the polypeptides of the present invention
by higher eukaryotes is increased by inserting an enhancer sequence into the vector.
Enhancers are cis-acting elements of DNA, usually about 10 to 300 bp in length, that
act on a promoter to increase its transcription. Examples include the SV40 enhancer on
the late side of the replication origin (bp 100 to 270), a cytomegalovirus early promoter
enhancer, the polyoma enhancer on the late side of the replication origin, and
adenovirus enhancers.

Generally, recombinant expression vectors will include origins of replication
and selectable markers permitting transformation of the host cell, e.g., the ampicillin
resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a
highly-expressed gene to direct transcription of a downstream structural sequence.
Such promoters can be derived from operons encoding glycolytic enzymes such as
3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins,
among others. The heterologous structural sequence is assembled in appropriate phase
with translation initiation and termination sequences, and preferably, a leader sequence
capable of directing secretion of translated protein into the periplasmic space or
extracellular medium. Optionally, the heterologous sequence can encode a fusion
protein including an N-terminal identification peptide imparting desired
characteristics, e.g., stabilization or simplified purification of expressed recombinant
product.

Useful expression vectors for bacterial use are constructed by inserting a 
structural DNA sequence encoding a desired protein together with suitable translation 
initiation and termination signals in operable reading phase with a functional promoter. 
The vector will comprise one or more phenotypic selectable markers and an origin of 
replication to ensure maintenance of the vector and to, if desirable, provide 
amplification within the host. Suitable prokaryotic hosts for transformation include
E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera
Pseudomonas, Streptomyces, and Staphylococcus, although others may also be 
employed as a matter of choice. 

As a representative but nonlimiting example, useful expression vectors for 
bacterial use can comprise a selectable marker and bacterial origin of replication 
derived from commercially available plasmids comprising genetic elements of the well 
known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for
example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and pGEM1
(Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections are 
combined with an appropriate promoter and the structural sequence to be expressed. 

Following transformation of a suitable host strain and growth of the host strain 
to an appropriate cell density, the selected promoter is induced by appropriate means
(e.g., temperature shift or chemical induction) and cells are cultured for an additional 
period. 

Cells are typically harvested by centrifugation, disrupted by physical or
chemical means, and the resulting crude extract retained for further purification.
Microbial cells employed in expression of proteins can be disrupted by any
convenient method, including freeze-thaw cycling, sonication, mechanical disruption,
or use of cell lysing agents; such methods are well known to those skilled in the art.

Various mammalian cell culture systems can also be employed to express
recombinant protein. Examples of mammalian expression systems include the COS-7
lines of monkey kidney fibroblasts, described by Gluzman, Cell, 23:175 (1981), and
other cell lines capable of expressing a compatible vector, for example, the C127, 3T3,
CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an
origin of replication, a suitable promoter and enhancer, and also any necessary
ribosome binding sites, polyadenylation site, splice donor and acceptor sites,
transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA
sequences derived from the SV40 splice and polyadenylation sites may be used to
provide the required nontranscribed genetic elements.

The polypeptide can be recovered and purified from recombinant cell cultures
by methods including ammonium sulfate or ethanol precipitation, acid extraction,
anion or cation exchange chromatography, phosphocellulose chromatography,
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite
chromatography and lectin chromatography. Protein refolding steps can be used, as
necessary, in completing configuration of the mature protein. Finally, high
performance liquid chromatography (HPLC) can be employed for final purification
steps.

The polypeptides of the present invention may be a naturally purified product,
or a product of chemical synthetic procedures, or produced by recombinant techniques
from a prokaryotic or eukaryotic host (for example, by bacterial, yeast, higher plant,
insect and mammalian cells in culture). Depending upon the host employed in a
recombinant production procedure, the polypeptides of the present invention may be
glycosylated or may be non-glycosylated. Polypeptides of the invention may also
include an initial methionine amino acid residue.

The fragments, analogs and derivatives of the polypeptides of the present
invention may be assayed for determination of mismatch-nicking activity and
glycosylase activity. As an example of such an assay, protein samples are incubated
with 1.8 fmol of either a 5'-end-labeled 116-mer, a 3'-end-labeled 120-mer, or a
3'-end-labeled 20-mer duplex DNA containing mismatches (see Yeh, Y.-C., et al., 1991,
J. Biol. Chem. 266:6480-6486; Roelen, H.C.P.F., et al., 1991, Nucleic Acids Res.
19:4361-4369) in a 20 µl reaction mixture containing 10 mM Tris-HCl (pH 7.6), 5 µM
ZnCl2, 0.5 mM DTT, 0.5 mM EDTA, and 1.5% glycerol. Following a 2 hour
incubation at 37°C, the reaction products are lyophilized and dissolved in a solution
containing 3 µl of 90% (vol/vol) formamide, 10 mM EDTA, 0.1% (wt/vol) xylene
cyanol, and 0.1% (wt/vol) bromophenol blue. After heating at 90°C for 3 minutes,
DNA samples are analyzed on 8% polyacrylamide-8.3 M urea DNA sequencing gels
(Maxam, A.M. and W. Gilbert, 1980, Methods Enzymol., 65:499-560), and the gel
is then autoradiographed. The DNA glycosylase activity is monitored by adding
piperidine, after the enzyme incubation, to a final concentration of 1 M. After 30
minutes of incubation at 90°C, the reaction products are analyzed as described above.

An enzyme binding assay may also be performed wherein protein-DNA
complexes are analyzed on 4% polyacrylamide gels in 50 mM Tris-borate (pH 8.3)
and 1 mM EDTA. Protein samples are incubated with 3'-end-labeled 20-bp
oligonucleotides as in the nicking assay, except that 20 ng of poly(dI-dC) is added to
each reaction mixture. Bovine serum albumin (1 µg) is added as indicated to the binding
assay. For the binding competition assay, in addition to the 1.8 fmol of labeled 20-mer
substrates, unlabeled 19-mer DNAs containing A/G, A/GO, or C/G pairings are added
in excess, up to 180 fmol.
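
The quantities quoted for these assays can be put on a common scale with a little arithmetic. The following sketch is illustrative only: it assumes that the binding reaction uses the same 20 µl volume as the nicking assay, and the function names are hypothetical rather than part of any described procedure. Under those assumptions, 1.8 fmol of labeled substrate corresponds to about 90 pM, and 180 fmol of competitor to about a 100-fold molar excess.

```python
# Illustrative arithmetic for the assay quantities quoted in the text.
# Assumption: the reaction volume is 20 microliters, as in the nicking assay.
# Function names are hypothetical and used only for this sketch.


def concentration_pM(amount_fmol: float, volume_ul: float) -> float:
    """Convert an amount (fmol) in a volume (microliters) to picomolar.

    1 fmol per microliter equals 1 nM, i.e. 1000 pM.
    """
    return amount_fmol / volume_ul * 1000.0


def fold_excess(competitor_fmol: float, labeled_fmol: float) -> float:
    """Molar excess of unlabeled competitor over labeled substrate."""
    return competitor_fmol / labeled_fmol


labeled_fmol = 1.8          # labeled duplex per reaction (from the text)
volume_ul = 20.0            # reaction volume (from the nicking assay)
max_competitor_fmol = 180.0  # upper limit of unlabeled competitor (from the text)

substrate_pM = concentration_pM(labeled_fmol, volume_ul)
max_excess = fold_excess(max_competitor_fmol, labeled_fmol)

print(f"labeled substrate: {substrate_pM:.1f} pM")
print(f"maximum competitor excess: {max_excess:.0f}-fold")
```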

The invention provides a process for diagnosing a disease, particularly
cancer, comprising determining from a sample derived from a patient a decreased
level of activity of polypeptide having the sequence of Figure 1 (SEQ ID NO: 2).
Decreased activity may be readily measured by one skilled in the art, for example by
determining the presence of an amino acid variation from the sequence in Figure
1 (SEQ ID NO: 2), followed by using the aforementioned enzyme binding assay or
by measurement of mismatch-nicking activity and glycosylase activity. The invention
also provides a process for diagnosing a cancer comprising determining from a
sample derived from a patient a decreased level of expression of polypeptide having
the sequence of Figure 1 (SEQ ID NO: 2). Decreased protein expression can be
measured using, on known quantities of protein, the aforementioned enzyme binding
assay or by measurement of mismatch-nicking activity and glycosylase activity and
comparing these activities to known quantities of non-variant hMYH polypeptide.

The hMYH polypeptides may also be employed in accordance with the present
invention by expression of such polypeptides in vivo, which is often referred to as
"gene therapy."

Thus, for example, cells from a patient may be engineered with a
polynucleotide (DNA or RNA) encoding a polypeptide ex vivo, with the engineered
cells then being provided to a patient to be treated with the polypeptide. Such methods
are well-known in the art and are apparent from the teachings herein. For example,
cells may be engineered by the use of a retroviral plasmid vector containing RNA
encoding a polypeptide of the present invention.

Similarly, cells may be engineered in vivo for expression of a polypeptide in
vivo by, for example, procedures known in the art. For example, a packaging cell is
transduced with a retroviral plasmid vector containing RNA encoding a polypeptide of
the present invention such that the packaging cell now produces infectious viral
particles containing the gene of interest. These producer cells may be administered to
a patient for engineering cells in vivo and expression of the polypeptide in vivo. These
and other methods for administering a polypeptide of the present invention by such
method should be apparent to those skilled in the art from the teachings of the present
invention.

Retroviruses from which the retroviral plasmid vectors hereinabove mentioned 
may be derived include, but are not limited to, Moloney Murine Leukemia Virus, 
spleen necrosis virus, retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma 
Virus, avian leukosis virus, gibbon ape leukemia virus, human immunodeficiency 
virus, adenovirus, Myeloproliferative Sarcoma Virus, and mammary tumor virus. In
one embodiment, the retroviral plasmid vector is derived from Moloney Murine 
Leukemia Virus. 

The vector includes one or more promoters. Suitable promoters which may be
employed include, but are not limited to, the retroviral LTR; the SV40 promoter; and
the human cytomegalovirus (CMV) promoter described in Miller, et al.,
Biotechniques, Vol. 7, No. 9, 980-990 (1989), or any other promoter (e.g., cellular
promoters such as eukaryotic cellular promoters including, but not limited to, the
histone, pol III, and β-actin promoters). Other viral promoters which may be
employed include, but are not limited to, adenovirus promoters, thymidine kinase (TK)
promoters, and B19 parvovirus promoters. The selection of a suitable promoter will be
apparent to those skilled in the art from the teachings contained herein.

The nucleic acid sequence encoding the polypeptide of the present invention is
under the control of a suitable promoter. Suitable promoters which may be employed
include, but are not limited to, adenoviral promoters, such as the adenoviral major late
promoter; or heterologous promoters, such as the cytomegalovirus (CMV) promoter;
the respiratory syncytial virus (RSV) promoter; inducible promoters, such as the MMT
promoter, the metallothionein promoter, heat shock promoters; the albumin promoter;
the ApoAI promoter; human globin promoters; viral thymidine kinase promoters, such
as the Herpes Simplex thymidine kinase promoter; retroviral LTRs (including the
modified retroviral LTRs hereinabove described); the β-actin promoter; and human
growth hormone promoters. The promoter also may be the native promoter which
controls the gene encoding the polypeptide.

The retroviral plasmid vector is employed to transduce packaging cell lines to
form producer cell lines. Examples of packaging cells which may be transfected
include, but are not limited to, the PE501, PA317, ψ-2, ψ-AM, PA12, T19-14X,
VT-19-17-H2, ψCRE, ψCRIP, GP+E-86, GP+envAm12, and DAN cell lines as described
in Miller, Human Gene Therapy Vol. 1, pgs. 5-14 (1990), which is incorporated
herein by reference in its entirety. The vector may transduce the packaging cells
through any means known in the art. Such means include, but are not limited to,
electroporation, the use of liposomes, and CaPO4 precipitation. In one alternative, the
retroviral plasmid vector may be encapsulated into a liposome, or coupled to a lipid,
and then administered to a host.

The producer cell line generates infectious retroviral vector particles which
include the nucleic acid sequence(s) encoding the polypeptides. Such retroviral vector
particles then may be employed to transduce eukaryotic cells, either in vitro or in vivo.
The transduced eukaryotic cells will express the nucleic acid sequence(s) encoding the
polypeptide. Eukaryotic cells which may be transduced include, but are not limited to,
embryonic stem cells, embryonic carcinoma cells, as well as hematopoietic stem cells,
hepatocytes, fibroblasts, myoblasts, keratinocytes, endothelial cells, and bronchial
epithelial cells.

Once the hMYH gene is expressed intracellularly, it may be employed to
repair DNA mismatches and thereby prevent cells from undergoing uncontrolled growth
and neoplasia such as occurs in cancer and tumors.

The hMYH gene and gene product of the present invention may be employed
to treat patients who have a defect in the hMYH gene. Among the disorders which
may be treated in such cases is cancer, and most particularly colon cancer, such as, for
example, HNPCC, as well as xeroderma pigmentosum.

hMYH may also be employed to repair oxidative damage to and oxidation of
DNA and prevent mutations from oxidative lesions and other modifications of DNA
that can be repaired by hMYH. Skilled artisans will be able to use the DNA repair
assays of the invention to determine which defects and/or modifications of DNA can
be repaired by hMYH.

In accordance with a further aspect of the invention, there is provided a process
for determining susceptibility to cancer, and particularly colon cancer, and most
particularly HNPCC. Thus, a mutation in hMYH indicates a susceptibility to cancer,
and the nucleic acid sequences described above may be employed in an assay for
ascertaining such susceptibility. Thus, for example, the assay may be employed to
determine a mutation in a human DNA repair protein as herein described, such as a
deletion, truncation, insertion, frame shift, etc., with such mutation being indicative of
a susceptibility to cancer.

A mutation may be ascertained, for example, by a DNA sequencing assay.
Tissue samples, including but not limited to blood samples, are obtained from a human
patient. The samples are processed by methods known in the art to capture the RNA.
First strand cDNA is synthesized from the RNA samples by adding an oligonucleotide
primer consisting of polythymidine residues which hybridize to the polyadenosine
stretch present on the mRNAs. Reverse transcriptase and deoxynucleotides are added
to allow synthesis of the first strand cDNA. Primer sequences are synthesized based
on the DNA sequence of the DNA repair protein of the invention. The primer
sequence generally comprises at least 15 consecutive bases, and may contain at
least 30 or even 50 consecutive bases.

Individuals carrying mutations in the gene of the present invention may also be
detected at the DNA level by a variety of techniques. Nucleic acids for diagnosis may
be obtained from a patient's cells, including but not limited to blood, urine, saliva,
tissue biopsy and autopsy material. The genomic DNA may be used directly for
detection or may be amplified enzymatically by using PCR (Saiki et al., Nature,
324:163-166 (1986)) prior to analysis. RT-PCR can also be used to detect mutations.
It is particularly preferred to use RT-PCR in conjunction with automated detection
systems, such as, for example, GeneScan. RNA or cDNA may also be used for the
same purpose, with PCR or RT-PCR. As an example, PCR primers complementary to the
nucleic acid encoding hMYH can be used to identify and analyze mutations.
Examples of representative primers are shown below in Table 1. For example,
deletions and insertions can be detected by a change in size of the amplified product in
comparison to the normal genotype. Point mutations can be identified by hybridizing
amplified DNA to radiolabeled RNA or, alternatively, radiolabeled antisense DNA
sequences. Perfectly matched sequences can be distinguished from mismatched
duplexes by RNase A digestion or by differences in melting temperatures.

Table 1

Primers used for detection of mutations in hMYH gene

No.   Sequence                                  SEQ ID NO:
1     5' TCCTCTGAAGCTTGAGGAGCCTCTAGAACT 3'      10
2     5' TAGCTCCATGGCTGCTTGGTTGAAA 3'           11
3     5' GCCATCATGAGGAAGCCACGAGCAG 3'           12
4     5' TAGCTCCATGGCTGCTTGGTTGAAA 3'           13
5     5' TTGACCCGAAACTGCTGAATAG 3'              14
6     5' CAGTGGAGATGTGAGACCGAAAGAA 3'           15
7     5' CAGCCCGGCCAGGAGATTTCAACCA 3'           16
8     5' CAGTGGAGATGTGAGACCGAAAGAA 3'           17
9     5' CCCTCACTAAAGGGAACAAAAGCTGG 3'          18

The above primers may be used for amplifying hMYH cDNA isolated from a
sample derived from a patient. The invention also provides the primers of Table 1
with 1, 2, 3 or 4 nucleotides removed from the 5' and/or the 3' end. The primers may
be used to amplify the gene isolated from the patient such that the gene may then be
subjected to various techniques for elucidation of the DNA sequence. In this way,
mutations in the DNA sequence may be diagnosed.
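
By way of illustration only, the truncated primers contemplated above can be enumerated programmatically. The sketch below takes the first primer of Table 1 as printed and lists every variant with up to four nucleotides removed from the 5' end, the 3' end, or both. The function name and the tuple layout are hypothetical choices made for this example, not part of the disclosure.

```python
from itertools import product

# First primer of Table 1, copied from the text (written 5' to 3').
PRIMER_1 = "TCCTCTGAAGCTTGAGGAGCCTCTAGAACT"


def truncated_variants(primer: str, max_trim: int = 4) -> list[tuple[int, int, str]]:
    """List (trim5, trim3, sequence) for every truncation of the primer.

    trim5 and trim3 are the numbers of nucleotides removed from the 5' and
    3' ends respectively, each from 0 to max_trim. The untrimmed primer
    (0, 0) is excluded because it is not a truncation.
    """
    variants = []
    for trim5, trim3 in product(range(max_trim + 1), repeat=2):
        if trim5 == 0 and trim3 == 0:
            continue
        variants.append((trim5, trim3, primer[trim5:len(primer) - trim3]))
    return variants


variants = truncated_variants(PRIMER_1)
for trim5, trim3, seq in variants[:5]:
    print(f"-{trim5} nt at 5', -{trim3} nt at 3': 5' {seq} 3'")
print(f"{len(variants)} truncated variants in total")
```

Each end can lose between zero and four nucleotides independently, which gives 5 x 5 combinations less the untrimmed primer itself.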

Sequence differences between the reference gene and genes having mutations
may be revealed by the direct DNA sequencing method. In addition, cloned DNA
segments may be employed as probes to detect specific DNA segments. The
sensitivity of this method is greatly enhanced when combined with PCR. For
example, a sequencing primer is used with double-stranded PCR product or a
single-stranded template molecule generated by a modified PCR. The sequence
determination is performed by conventional procedures with radiolabeled nucleotide or
by automatic sequencing procedures with fluorescent tags.

Genetic testing based on DNA sequence differences may be achieved by
detection of alteration in electrophoretic mobility of DNA fragments in gels with or
without denaturing agents. Small sequence deletions and insertions can be visualized
by high resolution gel electrophoresis. DNA fragments of different sequences may be
distinguished on denaturing formamide gradient gels in which the mobilities of
different DNA fragments are retarded in the gel at different positions according to their
specific melting or partial melting temperatures (see, e.g., Myers et al., Science,
230:1242 (1985)).

Sequence changes at specific locations may also be revealed by nuclease
protection assays, such as RNase and S1 protection or the chemical cleavage method
(e.g., Cotton et al., PNAS, USA, 85:4397-4401 (1985)).

Thus, the detection of a specific DNA sequence and/or quantitation of the level
of the sequence may be achieved by methods such as hybridization, RNase protection,
chemical cleavage, direct DNA sequencing or the use of restriction enzymes (e.g.,
Restriction Fragment Length Polymorphisms (RFLP)) and Southern blotting of
genomic DNA. The invention provides a process for diagnosing a disease, particularly
a cancer, and most particularly colon cancer, such as, for example, HNPCC, comprising
determining from a sample derived from a patient a decreased level of expression of
polynucleotide having the sequence of Figure 1 (SEQ ID NO: 1). Decreased
expression of polynucleotide can be measured using any one of the methods well
known in the art for the quantitation of polynucleotides, such as, for example, PCR,
RT-PCR, RNase protection, Northern blotting and other hybridization methods.

In addition to more conventional gel-electrophoresis and DNA sequencing,
mutations can also be detected by in situ analysis.

Fluorescence in situ hybridization (FISH) of a cDNA clone to a metaphase
chromosomal spread can be used to provide a precise chromosomal location.

As an example of how this was performed, hMYH DNA was digested and
purified with QIAEX II DNA purification kit (QIAGEN, Inc., Chatsworth, CA) and
ligated to SuperCos 1 cosmid vector (STRATAGENE, La Jolla, CA). DNA was

WO 97/33903 



PCT/US96/03239 



purified using Qiagen Plasmid Purification Kit (QIAGEN Inc., Chatsworth, CA) and 1
µg was labeled by nick translation in the presence of Biotin-dATP using BioNick
Labeling Kit (GibcoBRL, Life Technologies Inc., Gaithersburg, MD). Biotinylation
was detected with GENE-TECT Detection System (CLONTECH Laboratories, Inc.,
Palo Alto, CA). In situ hybridization was performed on slides using ONCOR Light
Hybridization Kit (ONCOR, Gaithersburg, MD) to detect single copy sequences on
metaphase chromosomes. Peripheral blood of normal donors was cultured for three
days in RPMI 1640 supplemented with 20% FCS, 3% PHA and
penicillin/streptomycin, synchronized with 10⁻⁷ M methotrexate for 17 hours and
washed twice with unsupplemented RPMI. Cells were incubated with 10⁻³ M thymidine
for 7 hours. The cells were arrested in metaphase after 20 minutes incubation with
colcemid (0.5 µg/ml) followed by hypotonic lysis in 75 mM KCl for 15 minutes at 37°C.
Cell pellets were then spun out and fixed in Carnoy's fixative (3:1 methanol/acetic acid).

Metaphase spreads were prepared by adding a drop of the suspension onto
slides and air dried. Hybridization was performed by adding 100 ng of probe
suspended in 10 µl of hybridization mix (50% formamide, 2xSSC, 1% dextran
sulfate) with blocking human placental DNA (1 µg/ml). Probe mixture was denatured
for 10 minutes in a 70°C water bath and incubated for 1 hour at 37°C before placing on
a prewarmed (37°C) slide, which was previously denatured in 70% formamide/2xSSC
at 70°C, dehydrated in an ethanol series, and chilled to 4°C.

Slides were incubated for 16 hours at 37°C in a humidified chamber. Slides
were washed in 50% formamide/2xSSC for 10 minutes at 41°C and 2xSSC for 7
minutes at 37°C. Hybridization probe was detected by incubation of the slides with
FITC-Avidin (ONCOR, Gaithersburg, MD), according to the manufacturer's protocol.
Chromosomes were counterstained with propidium iodide suspended in mounting
medium. Slides were visualized using a Leitz ORTHOPLAN 2-epifluorescence
microscope and five computer images were taken using Imagenetics Computer and
Macintosh printer. hMYH maps to the short arm of chromosome 1 between p32.1
and p34.3.

Once a sequence has been mapped to a precise chromosomal location, the
physical position of the sequence on the chromosome can be correlated with genetic
map data. Such data are found, for example, in V. McKusick, Mendelian Inheritance
in Man (publicly available on line via computer). The relationship between genes and
diseases that have been mapped to the same chromosomal region is then identified
through linkage analysis (Co-Inheritance of Physically Adjacent Genes).

The polypeptides, their fragments or other derivatives, or analogs thereof, or
cells expressing them can be used as an immunogen to produce antibodies thereto.
These antibodies can be, for example, polyclonal or monoclonal antibodies. The
present invention also includes chimeric, single chain, and humanized antibodies, as
well as Fab fragments, or the product of an Fab expression library. Various procedures
known in the art may be used for the production of such antibodies and fragments.

Antibodies generated against the polypeptides corresponding to a sequence of
the present invention can be obtained by direct injection of the polypeptides into an
animal or by administering the polypeptides to an animal, preferably a nonhuman. The
antibody so obtained will then bind the polypeptides themselves. In this manner, even a
sequence encoding only a fragment of the polypeptides can be used to generate
antibodies binding the whole native polypeptides. Such antibodies can then be used to
isolate the polypeptide from tissue expressing that polypeptide.

For preparation of monoclonal antibodies, any technique which provides
antibodies produced by continuous cell line cultures can be used. Examples include
the hybridoma technique (Kohler and Milstein, 1975, Nature, 256:495-497), the trioma
technique, the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology
Today 4:72), and the EBV-hybridoma technique to produce human monoclonal
antibodies (Cole, et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R.
Liss, Inc., pp. 77-96).

Techniques described for the production of single chain antibodies (U.S. Patent
4,946,778) can be adapted to produce single chain antibodies to immunogenic
polypeptide products of this invention. Also, transgenic mice may be used to express
humanized antibodies to immunogenic polypeptide products of this invention.

The present invention will be further described with reference to the following
examples; however, it is to be understood that the present invention is not limited to
such examples. All parts or amounts, unless otherwise specified, are by weight.

In order to facilitate understanding of the following examples certain 
frequently occurring methods and/or terms will be described. 

"Plasmids" are designated by a lower case p preceded and/or followed by 
capital letters and/or numbers. The starting plasmids herein are either commercially 
available, publicly available on an unrestricted basis, or can be constructed from
available plasmids in accord with published procedures. In addition, equivalent 
plasmids to those described are known in the art and will be apparent to the ordinarily 
skilled artisan. 

"Digestion" of DNA refers to catalytic cleavage of the DNA with a restriction 
enzyme that acts only at certain sequences in the DNA. The various restriction
enzymes used herein are commercially available and their reaction conditions,
cofactors and other requirements were used as would be known to the ordinarily
skilled artisan. For analytical purposes, typically 1 µg of plasmid or DNA fragment is
used with about 2 units of enzyme in about 20 µl of buffer solution. For the purpose of
isolating DNA fragments for plasmid construction, typically 5 to 50 µg of DNA are
digested with 20 to 250 units of enzyme in a larger volume. Appropriate buffers and
substrate amounts for particular restriction enzymes are specified by the manufacturer.
Incubation times of about 1 hour at 37°C are ordinarily used, but may vary in
accordance with the supplier's instructions. After digestion the reaction is
electrophoresed directly on a polyacrylamide gel to isolate the desired fragment.

Size separation of the cleaved fragments is performed using the 8 percent
polyacrylamide gel described by Goeddel, D. et al., Nucleic Acids Res., 8:4057
(1980).



"Oligonucleotides" refers to either a single stranded polydeoxy nucleotide or 
two complementary polydeoxynucleotide strands which may be chemically 
synthesized. Such synthetic oligonucleotides have no 5' phosphate and thus will not 
ligate to another oligonucleotide without adding a phosphate with an ATP in the 
presence of a kinase. A synthetic oligonucleotide will ligate to a fragment that has not
been dephosphorylated. 

"Ligation" refers to the process of forming phosphodiester bonds between two 
double stranded nucleic acid fragments (Maniatis, T., et al., Id., p. 146). Unless
otherwise provided, ligation may be accomplished using known buffers and conditions
with 10 units of T4 DNA ligase ("ligase") per 0.5 µg of approximately equimolar
amounts of the DNA fragments to be ligated. 

Unless otherwise stated, transformation was performed as described in the 
method of Graham, F. and Van der Eb, A., Virology, 52:456-457 (1973). 



Example 1

Bacterial Expression and Purification of hMYH 

The DNA sequence encoding hMYH is initially amplified using PCR 
oligonucleotide primers corresponding to the 5* sequences of the processed hMYH 
protein (minus the signal peptide sequence) and the vector sequences 3' to the hMYH 

20 gene. Additional nucleotides corresponding to hMYH were added to the 5' and 3' 
sequences respectively. The 5' oligonucleotide primer has the sequence 5' 
CGCGGATCCGCCATCAIQACACCGCTCGTCTCC 3' (SEQ ID NO:3) contains a 
BamHI restriction enzyme site followed by 18 nucleotides of hMYH coding sequence 
starting from the presumed terminal amino acid of the processed protein codon. The 3' 

25 sequence 5' GCGfCTAGATCACTGGGCTGCACTGTTG 3' (SEQ ID NO:4) 
contains complementary sequences to Xbal site and is followed by 19 nucleotides of 
hMYH. The restriction enzyme sites correspond to the restriction enzyme sites on the 
bacterial expression vector pQE-9 (Qiagen, Inc. 9259 Eton Avenue, Chatsworth, CA, 
91311)). pQE-9 encodes antibiotic resistance (Amp), a bacterial origin of replication 
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(ori), an IPTG-regulatable promoter operator (P/O), a ribosome binding site (RBS), a 
6-His tag and restriction enzyme sites. pQE-9 then is digested with BamHI and Xbal. 
The amplified sequences are ligated into pQE-9 and were inserted in frame with the 
sequence encoding for the histidine tag and the RBS. The ligation mixture is then used 

5 to transform E, qo\i strain M15/rep 4 (Qiagen, Inc.) by the procedure described in 
Sambrook, J. et al„ Molecular Cloning: A Laboratory Manual, Cold Spring Laboratory 
Press, (1989). M15/rep4 contains multiple copies of the plasmid pREP4, which 
expresses the lacl repressor and also confers kanamycin resistance (Kan). 
Transformants are identified by their ability to grow on LB plates, and
ampicillin/kanamycin resistant colonies are selected. Plasmid DNA is isolated and
confirmed by restriction analysis. Clones containing the desired constructs are grown
overnight (O/N) in liquid culture in LB media supplemented with both Amp (100
µg/ml) and Kan (25 µg/ml). The O/N culture is used to inoculate a large culture at a
ratio of 1:100 to 1:250. The cells are grown to an optical density 600 (O.D.600) of
between 0.4 and 0.6. IPTG ("Isopropyl-β-D-thiogalactopyranoside") is then added to
a final concentration of 1 mM. IPTG induces by inactivating the lacI repressor,
clearing the P/O and leading to increased gene expression. Cells are grown an extra 3 to 4
hours. Cells are then harvested by centrifugation. The cell pellet is solubilized in the
chaotropic agent 6 molar guanidine HCl. After clarification, solubilized hMYH is
purified from this solution by chromatography on a Nickel-Chelate column under
conditions that allow for tight binding by proteins containing the 6-His (histidine) tag
(Hochuli, E. et al., J. Chromatography 411:177-184 (1984)). hMYH is eluted from the
column in 6 molar guanidine HCl pH 5.0 and, for the purpose of renaturation, adjusted
to 3 molar guanidine HCl, 100 mM sodium phosphate, 10 mmolar glutathione
(reduced) and 2 mmolar glutathione (oxidized). After incubation in this solution for 12
hours the protein is dialyzed to 10 mmolar sodium phosphate.
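
The inoculation ratio, the IPTG addition and the guanidine adjustment described above are all simple dilutions. The following is a minimal sketch of that arithmetic, not part of the described procedure. The 1 litre culture volume, the 1 M IPTG stock and the 10 ml renaturation volume are hypothetical working values chosen for illustration; only the ratios and final concentrations come from the text.

```python
def stock_volume(final_conc, final_volume, stock_conc):
    """Volume of stock needed so that final_volume has final_conc (C1*V1 = C2*V2)."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume / stock_conc

# Hypothetical working values (not from the text): 1 L culture, 1 M IPTG stock.
culture_ml = 1000.0
iptg_stock_mM = 1000.0

# Overnight culture needed for the 1:100 and 1:250 ratios given in the text.
inoculum_ml = [culture_ml / ratio for ratio in (100, 250)]

# IPTG stock needed for the 1 mM final concentration given in the text
# (the small volume added is ignored in this approximation).
iptg_ml = stock_volume(1.0, culture_ml, iptg_stock_mM)

# 6 M guanidine HCl eluate needed per 10 ml (hypothetical) of 3 M solution.
eluate_ml = stock_volume(3.0, 10.0, 6.0)

print(inoculum_ml, iptg_ml, eluate_ml)
```

Under these assumed volumes the overnight inoculum is between 4 and 10 ml, and the eluate is diluted with an equal volume of renaturation components.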

Example 2

Cloning and expression of hMYH using the baculovirus expression system

The DNA sequence encoding the full length hMYH protein was amplified
using PCR oligonucleotide primers corresponding to the 5' and 3' sequences of the
gene:

The 5' primer has the sequence 5'

CGCGGATCCCGCAATCATGACACCGCTCGTCTCC 3' (SEQ ID NO:5) and
contains a BamHI restriction enzyme site (in bold) followed by 18 nucleotides
resembling an efficient signal for the initiation of translation in eukaryotic cells
(Kozak, M., J. Mol. Biol., 196:947-950 (1987)) which is just behind the first 6
nucleotides of the hMYH gene (the initiation codon for translation "ATG" is
underlined).

The 3' primer has the sequence 5'

GCGTCTAGATCACTGGGCTGCACTGTTG 3' (SEQ ID NO:6) and contains the
cleavage site for the restriction endonuclease XbaI and a number of nucleotides
complementary to the 3' non-translated sequence of the hMYH gene sufficient for
stable hybridization. The amplified sequences were isolated from a 1% agarose gel
using a commercially available kit ("Geneclean," BIO 101 Inc., La Jolla, Ca.). The
fragment was then digested with the endonucleases BamHI and XbaI and then purified
again on a 1% agarose gel. This fragment is designated F2.
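
The layout of the two primers can be checked against the sequence listing with a few string operations. The sketch below is illustrative only. It takes the primers from SEQ ID NO:5 and SEQ ID NO:6 and the first and last bases of the open reading frame from SEQ ID NO:1, locates the BamHI (GGATCC) and XbaI (TCTAGA) recognition sites, and confirms that the remaining primer bases match the ends of the reading frame. The variable and function names are assumptions introduced here.

```python
BAMHI = "GGATCC"   # BamHI recognition site
XBAI = "TCTAGA"    # XbaI recognition site

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Primers as given in the sequence listing (SEQ ID NO:5 and SEQ ID NO:6).
forward = "CGCGGATCCCGCAATCATGACACCGCTCGTCTCC"
reverse = "GCGTCTAGATCACTGGGCTGCACTGTTG"

# First 18 and last 19 bases of the hMYH reading frame, copied from SEQ ID NO:1.
orf_start = "ATGACACCGCTCGTCTCC"
orf_end = "CAACAGTGCAGCCCAGTGA"

fwd_site = forward.find(BAMHI)           # position of the BamHI site
rev_site = reverse.find(XBAI)            # position of the XbaI site
fwd_anneal = forward[-len(orf_start):]   # gene-specific 3' end of the 5' primer
rev_anneal = reverse_complement(reverse[rev_site + len(XBAI):])

print(fwd_site, rev_site, fwd_anneal == orf_start, rev_anneal == orf_end)
```

As read from the listing, the gene-specific part of the 3' primer is the reverse complement of the last 19 bases of the reading frame, stop codon included.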

The vector pA2 (modification of pVL941 vector, discussed below) is used for
the expression of the hMYH protein using the baculovirus expression system (for
review see: Summers, M.D. and Smith, G.E. 1987, A manual of methods for
baculovirus vectors and insect cell culture procedures, Texas Agricultural
Experimental Station Bulletin No. 1555). This expression vector contains the strong
polyhedrin promoter of the Autographa californica nuclear polyhedrosis virus
(AcMNPV) followed by the recognition sites for the restriction endonucleases BamHI
and XbaI. The polyadenylation site of the simian virus SV40 is used for efficient
polyadenylation. For an easy selection of recombinant virus the beta-galactosidase
gene from E. coli is inserted in the same orientation as the polyhedrin promoter
followed by the polyadenylation signal of the polyhedrin gene. The polyhedrin
sequences are flanked at both sides by viral sequences for the cell-mediated
homologous recombination of co-transfected wild-type viral DNA. Many other
baculovirus vectors could be used in place of pA2 such as pAc373, pVL941, pRG1
and pAcIM1 (Luckow, V.A. and Summers, M.D., Virology, 170:31-39).

The plasmid is digested with the restriction enzymes BamHI and XbaI and
then dephosphorylated using calf intestinal phosphatase by procedures known in the
art. The DNA is then isolated from a 1% agarose gel using the commercially available
kit ("Geneclean" BIO 101 Inc., La Jolla, Ca.). This vector DNA is designated V2.

Fragment F2 and the dephosphorylated plasmid V2 are ligated with T4 DNA
ligase. E. coli HB101 cells are then transformed and bacteria are identified that contain
the plasmid (pBachMYH) with the hMYH gene using the enzymes BamHI and XbaI.
The sequence of the cloned fragment is confirmed by DNA sequencing.

5 µg of the plasmid pBachMYH is co-transfected with 1.0 µg of a
commercially available linearized baculovirus ("BaculoGold™ baculovirus DNA",
Pharmingen, San Diego, CA.) using the lipofection method (Felgner et al. Proc. Natl.
Acad. Sci. USA, 84:7413-7417 (1987)).

1 µg of BaculoGold™ virus DNA and 5 µg of the plasmid pBachMYH are
mixed in a sterile well of a microtiter plate containing 50 µl of serum free Grace's
medium (Life Technologies Inc., Gaithersburg, MD). Afterwards 10 µl Lipofectin
plus 90 µl Grace's medium are added, mixed and incubated for 15 minutes at room
temperature. Then the transfection mixture is added drop-wise to the Sf9 insect cells
(ATCC CRL 1711) seeded in a 35 mm tissue culture plate with 1 ml Grace's medium
without serum. The plate is rocked back and forth to mix the newly added solution.
The plate is then incubated for 5 hours at 27°C. After 5 hours the transfection solution
is removed from the plate and 1 ml of Grace's insect medium supplemented with 10%
fetal calf serum is added. The plate is put back into an incubator and cultivation
continued at 27°C for four days.

After four days the supernatant is collected and a plaque assay performed
similar to that described by Summers and Smith (supra). As a modification an agarose gel
with "Blue Gal" (Life Technologies Inc., Gaithersburg) is used which allows an easy
isolation of blue stained plaques. (A detailed description of a "plaque assay" can also
be found in the user's guide for insect cell culture and baculovirology distributed by
Life Technologies Inc., Gaithersburg, page 9-10).

Four days after the serial dilution, the virus is added to the cells and blue
stained plaques are picked with the tip of an Eppendorf pipette. The agar containing
the recombinant viruses is then resuspended in an Eppendorf tube containing 200 µl of
Grace's medium. The agar is removed by a brief centrifugation and the supernatant
containing the recombinant baculovirus is used to infect Sf9 cells seeded in 35 mm
dishes. Four days later the supernatants of these culture dishes are harvested and then
stored at 4°C.

Sf9 cells are grown in Grace's medium supplemented with 10% heat-inactivated
FBS. The cells are infected with the recombinant baculovirus V-hMYH at
a multiplicity of infection (MOI) of 2. Six hours later the medium is removed and
replaced with SF900 II medium minus methionine and cysteine (Life Technologies
Inc., Gaithersburg). 42 hours later 5 µCi of 35S-methionine and 5 µCi 35S-cysteine
(Amersham) are added. The cells are further incubated for 16 hours before they are
harvested by centrifugation and the labelled proteins visualized by SDS-PAGE and
autoradiography.
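
The multiplicity of infection fixes how much virus stock is added once the cell number and the stock titer are known. The sketch below shows that calculation as an illustration only. The MOI of 2 is taken from the text, while the cell count and the titer are hypothetical values, since neither is stated above.

```python
def inoculum_volume_ml(moi, cell_count, titer_pfu_per_ml):
    """Virus stock volume (ml) giving `moi` plaque-forming units per cell."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi * cell_count / titer_pfu_per_ml

# MOI of 2 is from the text; the cell number and titer are hypothetical.
volume = inoculum_volume_ml(moi=2, cell_count=1_000_000, titer_pfu_per_ml=1e8)
print(f"{volume * 1000:.0f} ul of virus stock")
```

In practice the titer would come from a plaque assay such as the one described earlier in this example.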

Example 3 

Expression of Recombinant hMYH in COS cells

The expression plasmid, hMYH HA, is derived from a vector pcDNAI/Amp
(Invitrogen) containing: 1) SV40 origin of replication, 2) ampicillin resistance gene,
3) E. coli replication origin, 4) CMV promoter followed by a polylinker region, an
SV40 intron and polyadenylation site. A DNA fragment encoding the entire hMYH
precursor and a HA tag fused in frame to its 3' end was cloned into the polylinker
region of the vector; the recombinant protein expression is therefore directed by the
CMV promoter. The HA tag corresponds to an epitope derived from the influenza
hemagglutinin protein as previously described (I. Wilson, H. Niman, R. Houghten, A.
Cherenson, M. Connolly, and R. Lerner, Cell 37:767, (1984)). The fusion of the
HA tag to the target protein allows easy detection of the recombinant protein with an
antibody that recognizes the HA epitope.

The plasmid construction strategy is described as follows:
The DNA sequence encoding hMYH is constructed by PCR using two
primers: the 5' primer 5'-CGCGGATCCCX:CATCATGACACCGCTCGTCTCC-3'
(SEQ ID NO:7) contains a BamHI site followed by 18 nucleotides of hMYH coding
sequence starting from the initiation codon; the 3' sequence
5'-GCGCTCGAGCTGGGCTGCACTGTTGAGG-3' (SEQ ID NO:8) contains
complementary sequences to an XhoI site, translation stop codon, HA tag and the last 19
nucleotides of the hMYH coding sequence (not including the stop codon). Therefore,
the PCR product contains a BamHI site, hMYH coding sequence and an XhoI site.
The PCR amplified DNA fragment and the vector, pcDNAI/Amp (comprising an HA
tag at the 3' end), are digested with BamHI and XhoI restriction enzymes and ligated.
The ligation mixture is transformed into E. coli strain SURE (available from
Stratagene Cloning Systems, La Jolla); the transformed culture is plated on ampicillin
media plates and resistant colonies are selected. Plasmid DNA is isolated from
transformants and examined by restriction analysis for the presence of the correct
fragment. For expression of the recombinant hMYH, COS cells are transfected with
the expression vector by the DEAE-DEXTRAN method (J. Sambrook, E. Fritsch, T.
Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory
Press, (1989)). The expression of the hMYH HA protein is detected by radiolabelling and
immunoprecipitation (E. Harlow, D. Lane, Antibodies: A Laboratory Manual,
Cold Spring Harbor Laboratory Press, (1988)). Cells are labelled for 8 hours with
35S-cysteine two days post transfection. Culture media is then collected and cells are lysed
with detergent (RIPA buffer (150 mM NaCl, 0.1% SDS, 1% NP-40, 0.5% DOC,
50 mM Tris, pH 7.5)) (Wilson, I. et al., Id. 37:767 (1984)). Both cell lysate and culture
media are precipitated with an HA specific monoclonal antibody. Proteins precipitated
are analyzed on 15% SDS-PAGE gels.

Example 4

Expression via Gene Therapy

Fibroblasts are obtained from a subject by skin biopsy. The resulting tissue is
placed in tissue-culture medium and separated into small pieces. Small chunks of the
tissue are placed on a wet surface of a tissue culture flask; approximately ten pieces are
placed in each flask. The flask is turned upside down, closed tight and left at room
temperature overnight. After 24 hours at room temperature, the flask is inverted,
the chunks of tissue remain fixed to the bottom of the flask, and fresh media (e.g.,
Ham's F12 media, with 10% FBS, penicillin and streptomycin) is added. This is then
incubated at 37°C for approximately one week. At this time, fresh media is added and
subsequently changed every several days. After an additional two weeks in culture, a
monolayer of fibroblasts emerges. The monolayer is trypsinized and scaled into larger
flasks.

pMV-7 (Kirschmeier, P.T. et al., DNA, 7:219-25 (1988)), flanked by the long
terminal repeats of the Moloney murine sarcoma virus, is digested with EcoRI and
HindIII and subsequently treated with calf intestinal phosphatase. The linear vector is
fractionated on agarose gel and purified, using glass beads.

The cDNA encoding a polypeptide of the present invention is amplified using
PCR primers which correspond to the 5' and 3' end sequences respectively. The 5'
primer contains an EcoRI site and the 3' primer further includes a HindIII site.
Equal quantities of the Moloney murine sarcoma virus linear backbone and the
amplified EcoRI and HindIII fragment are added together, in the presence of T4 DNA
ligase. The resulting mixture is maintained under conditions appropriate for ligation of
the two fragments. The ligation mixture is used to transform bacteria HB101, which
are then plated onto agar containing kanamycin for the purpose of confirming that the
vector has the gene of interest properly inserted.

The amphotropic pA317 or GP+am12 packaging cells are grown in tissue
culture to confluent density in Dulbecco's Modified Eagles Medium (DMEM) with
10% calf serum (CS), penicillin and streptomycin. The MSV vector containing the
gene is then added to the media and the packaging cells are transduced with the vector.
The packaging cells now produce infectious viral particles containing the gene (the
packaging cells are now referred to as producer cells).

Fresh media is added to the transduced producer cells, and subsequently, the
media is harvested from a 10 cm plate of confluent producer cells. The spent media,
containing the infectious viral particles, is filtered through a millipore filter to remove
detached producer cells and this media is then used to infect fibroblast cells. Media is
removed from a sub-confluent plate of fibroblasts and quickly replaced with the media
from the producer cells. This media is removed and replaced with fresh media. If the
titer of virus is high, then virtually all fibroblasts will be infected and no selection is
required. If the titer is very low, then it is necessary to use a retroviral vector that has a
selectable marker, such as neo or his.

The engineered fibroblasts are then injected into the host, either alone or after
having been grown to confluence on cytodex 3 microcarrier beads. The fibroblasts
now produce the protein product.

Numerous modifications and variations of the present invention are possible in 
light of the above teachings and, therefore, within the scope of the appended claims, 
the invention may be practiced otherwise than as particularly described.

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 1811 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

CTAGTTCAGG CGGAAGGAGC AGTCCTCTGA AGCTTGAGGA GCCTCTAGAA CTATGAGCCC 60

GAGGCCTTCC CCTCTCCCAG AGCGCAGAGG CTTTGAAGGC TACCTCTGGG AAGCCGCTCA 120

CCGTCGGAAG CTGCGGGAGC TGAAACTGCG CCATCGTCAC TGTCGGCGGC C ATG ACA 177
Met Thr

CCG CTC GTC TCC CGC CTG AGT CGT CTG TGG GCC ATC ATG AGG AAG CCA 225
Pro Leu Val Ser Arg Leu Ser Arg Leu Trp Ala Ile Met Arg Lys Pro
5 10 15

CGA GCA GCC GTG GGA AGT GGT CAC AGG AAG CAG GCA GCC AGC CAG GAA 273
Arg Ala Ala Val Gly Ser Gly His Arg Lys Gln Ala Ala Ser Gln Glu
20 25 30

GGG AGG CAG AAG CAT GCT AAG AAC AAC AGT CAG OTC AAG CCT TCT GCC 321
Gly Arg Gln Lys His Ala Lys Asn Asn Ser Gln Ala Lys Pro Ser Ala
35 40 45 50

TGT GAT GGC CTG GCC AGG CAG CCG GAA GAG GTG GTA TTG CAG GCC TCT 369
Cys Asp Gly Leu Ala Arg Gln Pro Glu Glu Val Val Leu Gln Ala Ser
55 60 65

GTC TCC TCA TAC CAT CTA TTC AGA GAC GTA GCT GAA GTC ACA GCC TTC 417
Val Ser Ser Tyr His Leu Phe Arg Asp Val Ala Glu Val Thr Ala Phe
70 75 80

CGA GGG AGC CTG CTA AGC TGG TAC GAC CAA GAG AAA CGG GAC CTA CCA 465
Arg Gly Ser Leu Leu Ser Trp Tyr Asp Gln Glu Lys Arg Asp Leu Pro
85 90 95

TGG AGA AGA CGG GCA GAA GAT GAG ATG GAC CTG GAC AGG CGG GCA TAT 513
Trp Arg Arg Arg Ala Glu Asp Glu Met Asp Leu Asp Arg Arg Ala Tyr
100 105 110

GCT GTG TGG GTC TCA GAG GTC ATG CTG CAG CAG ACC CAG GTT GCC ACT 561
Ala Val Trp Val Ser Glu Val Met Leu Gln Gln Thr Gln Val Ala Thr
115 120 125 130

GTG ATC AAC TAC TAT ACC GGA TGG ATG CAG AAG TGG CCT ACA CTG CAG 609
Val Ile Asn Tyr Tyr Thr Gly Trp Met Gln Lys Trp Pro Thr Leu Gln
135 140 145

GAC CTG GCC AGT GCT TCC CTG GAG GAG GTG AAT CAA CTC TGG GCT GGC 657
Asp Leu Ala Ser Ala Ser Leu Glu Glu Val Asn Gln Leu Trp Ala Gly
150 155 160

CTG GGC TAC TAT TCT CGT GGC CGG CGG CTG CAG GAG GGA GCT CGG AAG 705
Leu Gly Tyr Tyr Ser Arg Gly Arg Arg Leu Gln Glu Gly Ala Arg Lys
165 170 175

GTG GTA GAG GAG CTA GGG GGC CAC ATG CCA CGT ACA GCA GAG ACC CTG 753
Val Val Glu Glu Leu Gly Gly His Met Pro Arg Thr Ala Glu Thr Leu
180 185 190

CAG CAG CTC CTG CCT GGC GTG GGG CGC TAC ACA GCT GGG GCC ATT GCC 801
Gln Gln Leu Leu Pro Gly Val Gly Arg Tyr Thr Ala Gly Ala Ile Ala
195 200 205 210

TCT ATC GCC TTT GGC CAG GCA ACC GGT GTG GTG GAT GGC AAC GTA GCA 849
Ser Ile Ala Phe Gly Gln Ala Thr Gly Val Val Asp Gly Asn Val Ala
215 220 225

CGG GTG CTG TGC CGT GTC CGA GCC ATT GGT GCT GAT CCC AGC AGC ACC 849
Arg Val Leu Cys Arg Val Arg Ala Ile Gly Ala Asp Pro Ser Ser Thr
230 235 240

CTT GTT TCC CAG CAG CTC TGG GGT CTA GCC CAG CAG CTG GTG GAC CCA 897
Leu Val Ser Gln Gln Leu Trp Gly Leu Ala Gln Gln Leu Val Asp Pro
245 250 255

GCC CGG CCA GGA GAT TTC AAC CAA GCA GCC ATG GAG CTA GGG GCC ACA 945
Ala Arg Pro Gly Asp Phe Asn Gln Ala Ala Met Glu Leu Gly Ala Thr
260 265 270

GTG TGT ACC CCA CAG CGC CCA CTG TGC AGC CAG TGC CCT GTG GAG AGC 993
Val Cys Thr Pro Gln Arg Pro Leu Cys Ser Gln Cys Pro Val Glu Ser
275 280 285 290

CTG TGC CGG GCA CGC CAG AGA GTG GAG CAG GAA CAG CTC TTA GCC TCA 1041
Leu Cys Arg Ala Arg Gln Arg Val Glu Gln Glu Gln Leu Leu Ala Ser
295 300 305

GGG AGC CTG TCG GGC AGT CCT GAC GTG GAG GAG TGT GCT CCC AAC ACT 1089
Gly Ser Leu Ser Gly Ser Pro Asp Val Glu Glu Cys Ala Pro Asn Thr
310 315 320

GGA CAG TGC CAC CTG TGC CTG CCT CCC TCG GAG CCC TGG GAC CAG ACC 1137
Gly Gln Cys His Leu Cys Leu Pro Pro Ser Glu Pro Trp Asp Gln Thr
325 330 335

CTG GGA GTG GTC AAC TTC CCC AGA AAG GCC AGC CGC AAG CCC CCC AGG 1185
Leu Gly Val Val Asn Phe Pro Arg Lys Ala Ser Arg Lys Pro Pro Arg
340 345 350

GAG GAG AGC TCT GCC ACC TGT GTT CTG GAA CAG CCT GGG GCC CTT GGG 1233
Glu Glu Ser Ser Ala Thr Cys Val Leu Glu Gln Pro Gly Ala Leu Gly
355 360 365 370

GCC CAA ATT CTG CTG GTG CAG AGG CCC AAC TCA GGT CTG CTG GCA GGA 1281
Ala Gln Ile Leu Leu Val Gln Arg Pro Asn Ser Gly Leu Leu Ala Gly
375 380 385

CTG TGG GAG TTC CCG TCC GTG ACC TGG GAG CCC TCA GAG CAG CTT CAG 1329
Leu Trp Glu Phe Pro Ser Val Thr Trp Glu Pro Ser Glu Gln Leu Gln
390 395 400

CGC AAG GCC CTG CTG CAG GAA CTA CAG CGT TGG GCT GGG CCC CTC CCA 1377
Arg Lys Ala Leu Leu Gln Glu Leu Gln Arg Trp Ala Gly Pro Leu Pro
405 410 415

GCC ACG CAC CTC CGG CAC CTT GGG GAG GTT GTC CAC ACC TTC TCT CAC 1425
Ala Thr His Leu Arg His Leu Gly Glu Val Val His Thr Phe Ser His
420 425 430

ATC AAG CTG ACA TAT CAA GTA TAT GGG CTG GCC TTG GAA GGG CAG ACC 1473
Ile Lys Leu Thr Tyr Gln Val Tyr Gly Leu Ala Leu Glu Gly Gln Thr
435 440 445 450

CCA GTG ACC ACC GTA CCA CCA GGT GCT CGC TGG CTG ACG CAG GAG GAA 1521
Pro Val Thr Thr Val Pro Pro Gly Ala Arg Trp Leu Thr Gln Glu Glu
455 460 465

TTT CAC ACC GCA GCT GTT TCC ACC GCC ATG AAA AAG GTT TTC CGT GTG 1569
Phe His Thr Ala Ala Val Ser Thr Ala Met Lys Lys Val Phe Arg Val
470 475 480

TAT CAG GGC CAA CAG CCA GGG ACC TGT ATG GGT TCC AAA AGG TCC CAG 1617
Tyr Gln Gly Gln Gln Pro Gly Thr Cys Met Gly Ser Lys Arg Ser Gln
485 490 495

GTG TCC TCT CCG TGC AGT CGG AAA AAG CCC CGC ATG GGC CAG CAA GTC 1665
Val Ser Ser Pro Cys Ser Arg Lys Lys Pro Arg Met Gly Gln Gln Val
500 505 510

CTG GAT AAT TTC TTT CGG TCT CAC ATC TCC ACT GAT GCA CAC AGC CTC 1713
Leu Asp Asn Phe Phe Arg Ser His Ile Ser Thr Asp Ala His Ser Leu
515 520 525 530

AAC AGT GCA GCC CAG TGA CACCTCTGAA AGCCCCCATT CCCTGAGAAT 1761
Asn Ser Ala Ala Gln *
535

CCTGTTGTTA GTAAAGTGCT TATTTTTGTA GTTAAAAAAA AAAAAAAAA 1811
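
The pairing of codons and residues in SEQ ID NO:1 can be spot-checked by translating a stretch of the listing with the standard genetic code. The sketch below is illustrative only and assumes the standard code applies. It translates the first 18 codons and the last six codons of the reading frame exactly as they are printed above and returns one-letter residue codes. The function and variable names are introduced here for the example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Codons copied from the listing (start of the frame and its last six codons).
orf_5prime = ("ATG ACA CCG CTC GTC TCC CGC CTG AGT CGT CTG TGG "
              "GCC ATC ATG AGG AAG CCA").replace(" ", "")
orf_3prime = "AAC AGT GCA GCC CAG TGA".replace(" ", "")

n_term = translate(orf_5prime)
c_term = translate(orf_3prime)
print(n_term, c_term)
```

The two fragments correspond to residues 1 to 18 (Met Thr Pro Leu ... Arg Lys Pro) and 531 to 535 (Asn Ser Ala Ala Gln) of SEQ ID NO:2.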



(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 535 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
Met Thr Pro Leu Val Ser Arg Leu Ser Arg Leu Trp Ala Ile Met
5 10 15

Arg Lys Pro Arg Ala Ala Val Gly Ser Gly His Arg Lys Gln Ala
20 25 30

Ala Ser Gln Glu Gly Arg Gln Lys His Ala Lys Asn Asn Ser Gln
35 40 45

Ala Lys Pro Ser Ala Cys Asp Gly Leu Ala Arg Gln Pro Glu Glu
50 55 60

Val Val Leu Gln Ala Ser Val Ser Ser Tyr His Leu Phe Arg Asp
65 70 75

Val Ala Glu Val Thr Ala Phe Arg Gly Ser Leu Leu Ser Trp Tyr
80 85 90

Asp Gln Glu Lys Arg Asp Leu Pro Trp Arg Arg Arg Ala Glu Asp
95 100 105

Glu Met Asp Leu Asp Arg Arg Ala Tyr Ala Val Trp Val Ser Glu
110 115 120

Val Met Leu Gln Gln Thr Gln Val Ala Thr Val Ile Asn Tyr Tyr
125 130 135

Thr Gly Trp Met Gln Lys Trp Pro Thr Leu Gln Asp Leu Ala Ser
140 145 150

Ala Ser Leu Glu Glu Val Asn Gln Leu Trp Ala Gly Leu Gly Tyr
155 160 165

Tyr Ser Arg Gly Arg Arg Leu Gln Glu Gly Ala Arg Lys Val Val
170 175 180

Glu Glu Leu Gly Gly His Met Pro Arg Thr Ala Glu Thr Leu Gln
185 190 195

Gln Leu Leu Pro Gly Val Gly Arg Tyr Thr Ala Gly Ala Ile Ala
200 205 210

Ser Ile Ala Phe Gly Gln Ala Thr Gly Val Val Asp Gly Asn Val
215 220 225

Ala Arg Val Leu Cys Arg Val Arg Ala Ile Gly Ala Asp Pro Ser
230 235 240

Ser Thr Leu Val Ser Gln Gln Leu Trp Gly Leu Ala Gln Gln Leu
245 250 255

Val Asp Pro Ala Arg Pro Gly Asp Phe Asn Gln Ala Ala Met Glu
260 265 270

Leu Gly Ala Thr Val Cys Thr Pro Gln Arg Pro Leu Cys Ser Gln
275 280 285

Cys Pro Val Glu Ser Leu Cys Arg Ala Arg Gln Arg Val Glu Gln
290 295 300

Glu Gln Leu Leu Ala Ser Gly Ser Leu Ser Gly Ser Pro Asp Val
305 310 315

Glu Glu Cys Ala Pro Asn Thr Gly Gln Cys His Leu Cys Leu Pro
320 325 330

Pro Ser Glu Pro Trp Asp Gln Thr Leu Gly Val Val Asn Phe Pro
335 340 345

Arg Lys Ala Ser Arg Lys Pro Pro Arg Glu Glu Ser Ser Ala Thr
350 355 360

Cys Val Leu Glu Gln Pro Gly Ala Leu Gly Ala Gln Ile Leu Leu
365 370 375

Val Gln Arg Pro Asn Ser Gly Leu Leu Ala Gly Leu Trp Glu Phe
380 385 390

Pro Ser Val Thr Trp Glu Pro Ser Glu Gln Leu Gln Arg Lys Ala
395 400 405

Leu Leu Gln Glu Leu Gln Arg Trp Ala Gly Pro Leu Pro Ala Thr
410 415 420

His Leu Arg His Leu Gly Glu Val Val His Thr Phe Ser His Ile
425 430 435

Lys Leu Thr Tyr Gln Val Tyr Gly Leu Ala Leu Glu Gly Gln Thr
440 445 450

Pro Val Thr Thr Val Pro Pro Gly Ala Arg Trp Leu Thr Gln Glu
455 460 465

Glu Phe His Thr Ala Ala Val Ser Thr Ala Met Lys Lys Val Phe
470 475 480

Arg Val Tyr Gln Gly Gln Gln Pro Gly Thr Cys Met Gly Ser Lys
485 490 495

Arg Ser Gln Val Ser Ser Pro Cys Ser Arg Lys Lys Pro Arg Met
500 505 510

Gly Gln Gln Val Leu Asp Asn Phe Phe Arg Ser His Ile Ser Thr
515 520 525

Asp Ala His Ser Leu Asn Ser Ala Ala Gln
530 535

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 34 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
CGCGGATCCG CCATCATTGA CACCGCTCGT CTCC 34 



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 28 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GCGTCTAGAT CACTGGGCTG CACTGTTG 28 



(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 34 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
CGCGGATCCC GCAATCATGA CACCGCTCGT CTCC 34 



(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 28 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
GCGTCTAGAT CACTGGGCTG CACTGTTG 28 



(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:



(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:



(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS 
(A) LENGTH: 355 AMINO ACIDS 
(B) TYPE: AMINO ACID

(C) STRANDEDNESS: 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

Met Gln Ala Ser Gln Phe Ser Ala Gln Val Leu Asp Trp Tyr Asp
5 10 15

Lys Tyr Gly Arg Lys Thr Leu Pro Trp Gln Ile Asp Lys Thr Pro
20 25 30

Tyr Lys Val Trp Leu Ser Glu Val Met Leu Gln Gln Thr Gln Val
35 40 45

Ala Thr Val Ile Pro Tyr Phe Glu Arg Phe Met Ala Arg Phe Pro
50 55 60

Thr Val Thr Asp Leu Ala Asn Ala Pro Leu Asp Glu Val Leu His
65 70 75

Leu Trp Thr Gly Leu Gly Tyr Tyr Ala Arg Ala Arg Asn Leu His
80 85 90

Lys Ala Ala Gln Gln Val Ala Thr Leu His Gly Gly Lys Phe Pro
95 100 105

Glu Thr Phe Glu Glu Val Ala Ala Leu Pro Gly Val Gly Arg Ser
110 115 120

Thr Ala Gly Ala Ile Leu Ser Leu Ser Leu Gly Lys His Phe Pro
125 130 135

Ile Leu Asp Gly Asn Val Lys Arg Val Leu Ala Arg Cys Tyr Ala
140 145 150

Val Ser Gly Trp Pro Gly Lys Lys Glu Val Glu Asn Lys Leu Trp
155 160 165

Ser Leu Ser Glu Gln Val Thr Pro Ala Val Gly Val Glu Arg Phe
170 175 180

Asn Gln Ala Met Met Asp Leu Gly Ala Met Ile Cys Thr Arg Ser
185 190 195

Lys Pro Lys Cys Ser Leu Cys Pro Leu Gln Asn Gly Cys Ile Ala
200 205 210

Ala Ala Asn Asn Ser Trp Ala Leu Tyr Pro Gly Lys Lys Pro Lys
215 220 225

Gln Thr Leu Pro Glu Arg Thr Gly Tyr Phe Leu Leu Leu Gln His
230 235 240

Glu Asp Glu Val Leu Leu Ala Gln Arg Pro Pro Ser Gly Leu Trp
245 250 255

Gly Gly Leu Tyr Cys Phe Pro Gln Phe Ala Asp Glu Glu Ser Leu
260 265 270

Arg Gln Trp Leu Ala Gln Arg Gln Ile Ala Ala Asp Asn Leu Thr
275 280 285

Gln Leu Thr Ala Phe Arg His Thr Phe Ser His Phe His Leu Asp
290 295 300

Ile Val Pro Met Trp Leu Pro Val Ser Ser Phe Thr Gly Cys Met
305 310 315

Asp Glu Gly Asn Ala Leu Trp Tyr Asn Leu Ala Gln Pro Pro Ser
320 325 330

Val Gly Leu Ala Ala Pro Val Glu Arg Leu Leu Gln Gln Leu Arg
335 340 350

Thr Gly Ala Pro Val
355



(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 30 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
TCCTCTGAAG CTTGAGGAGC CTCTAGAACT 30 



(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 25 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TAGCTCCATG GCTGCTTGGT TGAAA 25

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 25 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GCCATCATGA GGAAGCCACG AGCAG 25 

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 25 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
TAGCTCCATG GCTGCTTGGT TGAAA 25 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
TTGACCCGAA ACTGCTGAAT AG 22

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 25 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CAGTGGAGAT GTGAGACCGA AAGAA 25 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 25 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CAGCCCGGCC AGGAGATTTC AACCA 25 



(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 25 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

CAGTGGAGAT GTGAGACCGA AAGAA 25

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CCCTCACTAA AGGGAACAAA AGCTGG 26

WHAT IS CLAIMED IS:

1. An isolated polynucleotide comprising a polynucleotide which is at least 70%
identical to a member selected from the group consisting of:

(a) a polynucleotide encoding a polypeptide comprising amino acid 1 to amino
acid 535 set forth in SEQ ID NO:2;

(b) a polynucleotide which is complementary to the polynucleotide of (b); and

(c) a polynucleotide comprising at least 30 consecutive bases of the
polynucleotide of (a) or (b).

2. The polynucleotide of Claim 1 wherein the polynucleotide is DNA.

3. The polynucleotide of Claim 1 wherein the polynucleotide is RNA.

4. The polynucleotide of Claim 1 wherein the polynucleotide is genomic DNA.

5. The polynucleotide of Claim 2 which encodes the polypeptide comprising amino
acid 1 to 535 of SEQ ID NO:2.

6. The polynucleotide of claim 2 comprising the sequence as set forth in SEQ ID
NO:1 from nucleotide 1 to nucleotide 1811.

7. The polynucleotide of claim 2 comprising the sequence as set forth in SEQ ID
NO:1 from nucleotide 172 to nucleotide 1729.

8. An isolated polynucleotide comprising a polynucleotide having at least a 70%
identity to a member selected from the group consisting of:

(a) a polynucleotide which is complementary to the polynucleotide of (a);
and

(b) a polynucleotide comprising at least 30 consecutive bases of the
polynucleotide of (a).

9. The isolated polynucleotide of claim 8 wherein said polynucleotide is the
polynucleotide which expresses hMYH.

10. An isolated polynucleotide comprising a member selected from the group
consisting of:

(a) DNA having at least 15 consecutive bases and which is at least 70%
complementary to a member selected from the group consisting of:

(i) DNA comprising at least 15 consecutive bases selected from the group
consisting of nucleotide 1 to nucleotide 1811 of SEQ ID NO:1;

(ii) DNA complementary to (i); and

(b) RNA corresponding to the DNA of (a).

11. A vector comprising the DNA of Claim 2.

12. A host cell genetically engineered with the vector of Claim 11.

13. A process for using the host cell of Claim 12 comprising: expressing from the
host cell a polypeptide encoded by DNA contained in the vector.

14. A process for using a cell comprising: genetically engineering the cell with the
vector of claim 11 such that the cell expresses a polypeptide encoded by the DNA
contained in said vector.

15. A polypeptide comprising a member selected from the group consisting of:

(a) a polypeptide having an amino acid sequence set forth in SEQ ID NO:2; and

(b) a polypeptide which is at least 70% identical to the polypeptide of (a).

16. The polypeptide of Claim 15 wherein the polypeptide comprises amino acid 1 to
amino acid 535 of SEQ ID NO:2.

17. A method for the treatment of a patient having need of hMYH comprising:
administering to the patient a therapeutically effective amount of the polypeptide of
claim 15 by providing to the patient DNA encoding said polypeptide and expressing
said polypeptide in vivo.

18. A diagnostic process comprising:

analyzing for the presence of the polypeptide of claim 15 in a sample derived from a
host.

19. A process for diagnosing a susceptibility to cancer comprising:

determining from a sample derived from a patient a mutation in the polynucleotide
sequence of claim 1.

20. A process for diagnosing a cancer comprising:

determining from a sample derived from a patient a decreased level of activity of a
polypeptide having the sequence of claim 15.

21. A process for diagnosing a cancer comprising:

determining from a sample derived from a patient a decreased level of expression of
a polypeptide having the sequence of claim 15.

22. A process for diagnosing a cancer comprising:

determining from a sample derived from a patient a decreased level of expression of
a polynucleotide having the sequence of claim 1.

23. A process for diagnosing cancer comprising:

determining from a sample derived from a patient a mutation in the polynucleotide
sequence of claim 1.

24. An antibody against a polypeptide of claim 15.

TCTCCTCG 7GGXC TAGTT CAGGC GGAAG GAGCA GTCCT CTGAA GCTTG -183

AGGAG CCTCT AGAAC TATGA GCCCG AGGCC TTCCC CTCTC CCAGA -135 

GCGCA GAGGC TTTGA AGGCT ACCTC TGGGA AGCCG CTCAC CGTCG -90 

GAAGC TGCGG GAGCT GAAAC TGCGC CATCG TCACT GTCGG CGGCC -45